1 Biomark, Inc. 705 South 8th St., Boise, Idaho, 83702, USA

Correspondence: Kevin E. See

Choosing Habitat Covariates

A key step in developing a QRF model to predict fish capacities was selecting the habitat covariates to include in the model. Random forest models naturally incorporate interactions between correlated covariates, which is essential because nearly all habitat variables are correlated to some degree. However, we aimed to avoid overly redundant variables (i.e., variables that measure similar aspects of the habitat). Further, including too many covariates can result in overfitting (e.g., including as many covariates as data points).

To prevent overfitting, we pared down the more than 100 metrics generated by the CHaMP protocol describing the quantity and quality of fish habitat for each survey site. To assist in determining the habitat metrics to include in the QRF model, we used the Maximal Information-Based Nonparametric Exploration (MINE) class of statistics (Reshef et al. 2011) to determine those habitat characteristics (covariates) most highly associated with observed parr densities. We calculated the maximal information coefficient (MIC), using the R package minerva (Filosi et al. 2019), to measure the strength of the linear or non-linear association between two variables (Reshef et al. 2011). The MIC value between each of the measured habitat characteristics and parr density was used to inform decisions on which habitat covariates to include in the QRF parr capacity model.
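To make the idea behind the MIC concrete, the following Python sketch computes a simplified MIC-style score. It is not the MINE algorithm implemented in minerva: instead of searching for the best grid partition at each resolution, it assumes equal-frequency (rank-based) bins on both axes, so it should be read as a rough approximation only. The function names (`equal_freq_bins`, `mutual_information`, `approx_mic`) are hypothetical, and the grid-size budget of n^0.6 follows the default suggested by Reshef et al. (2011).

```python
import math
from collections import Counter


def equal_freq_bins(values, n_bins):
    """Assign each value to one of n_bins bins of nearly equal count, by rank.

    Simplification: tied values may be split across adjacent bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(bx, by):
    """Mutual information (in nats) between two sequences of bin labels."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        mi += (count / n) * math.log((count * n) / (px[i] * py[j]))
    return mi


def approx_mic(x, y, alpha=0.6):
    """Simplified MIC-style score in [0, 1] (approximately).

    For every grid with nx * ny <= n**alpha cells, compute the mutual
    information on equal-frequency bins, normalize by log(min(nx, ny)),
    and return the maximum over grids. ASSUMPTION: equal-frequency bins
    replace the partition search used by the real MINE algorithm.
    """
    n = len(x)
    budget = n ** alpha
    best = 0.0
    nx = 2
    while nx * 2 <= budget:
        ny = 2
        while nx * ny <= budget:
            mi = mutual_information(equal_freq_bins(x, nx), equal_freq_bins(y, ny))
            best = max(best, mi / math.log(min(nx, ny)))
            ny += 1
        nx += 1
    return best


# Synthetic illustration (not study data): a non-monotonic relationship
# that a linear correlation coefficient would largely miss.
x = list(range(100))
y = [(v - 50) ** 2 for v in x]
print(round(approx_mic(x, y), 3))
```

Because the score is normalized mutual information maximized over grid resolutions, it can pick up non-linear associations such as the parabola above, which is the property that motivated using the MIC to screen habitat metrics against parr density.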

Habitat metrics were first grouped into broad categories that included channel unit, complexity, cover, disturbance, riparian, size, substrate, temperature, water quality, and woody debris. Habitat metrics measuring volume or area were scaled to the wetted area of each site. Within each category, metrics were ranked according to their MIC value (Table 1 and Figure 1). Our strategy was to select the one or two variables with the highest MIC value within each category so that the covariates describe different aspects of rearing habitat (e.g., substrate, temperature, etc.). Additionally, we attempted to avoid covariates that were highly correlated (Figure ??) while including covariates that can be directly influenced by restoration actions or have been shown to impact juvenile salmonid density.
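The ranking step described above can be sketched in Python as follows. This is only an illustration of the mechanical part of the procedure: the MIC values are copied from a subset of Table 1, but the function name `select_covariates`, the correlation threshold of 0.7, and the single correlation value in `HYPOTHETICAL_ABS_CORR` are assumptions made for the example and are not results from the study. The actual selection also weighed whether a covariate can be influenced by restoration actions, which this sketch does not attempt to encode.

```python
from collections import defaultdict

# (category, abbreviation, MIC) rows copied from a subset of Table 1.
TABLE_1_SUBSET = [
    ("Size", "MeanU", 0.346),
    ("Size", "WetWdth_Int", 0.332),
    ("Size", "BfWdthInt", 0.324),
    ("Temperature", "avg_aug_temp", 0.272),
    ("Temperature", "Elev_M", 0.262),
    ("Temperature", "aug_temp", 0.188),
    ("Substrate", "SubLT6", 0.237),
    ("Substrate", "SubLT2", 0.227),
    ("Substrate", "SubD16", 0.219),
]

# HYPOTHETICAL absolute correlation, for illustration only (not a study result).
HYPOTHETICAL_ABS_CORR = {
    frozenset(("SubLT6", "SubLT2")): 0.9,
}


def select_covariates(rows, abs_corr, per_category=2, max_corr=0.7):
    """Pick up to per_category metrics per category, highest MIC first.

    A candidate is skipped when its absolute correlation with any metric
    already selected exceeds max_corr (an assumed threshold). Pairs that
    are missing from abs_corr are treated as uncorrelated.
    """
    by_category = defaultdict(list)
    for category, name, mic in rows:
        by_category[category].append((mic, name))

    selected = []
    for category in sorted(by_category):
        kept = 0
        # Walk through the category from highest to lowest MIC.
        for mic, name in sorted(by_category[category], reverse=True):
            if kept == per_category:
                break
            too_similar = any(
                abs_corr.get(frozenset((name, other)), 0.0) > max_corr
                for other in selected
            )
            if not too_similar:
                selected.append(name)
                kept += 1
    return selected


chosen = select_covariates(TABLE_1_SUBSET, HYPOTHETICAL_ABS_CORR)
print(chosen)
```

With the placeholder correlation above, the second-ranked substrate metric is passed over in favor of the third, which mirrors the intent of keeping covariates that describe different aspects of the habitat.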

References

Filosi, M., R. Visintainer, and D. Albanese. 2019. Minerva: Maximal information-based nonparametric exploration for variable analysis.

Reshef, D. N., Y. A. Reshef, H. K. Finucane, S. R. Grossman, G. McVean, P. J. Turnbaugh, E. S. Lander, M. Mitzenmacher, and P. C. Sabeti. 2011. Detecting novel associations in large data sets. Science 334:1518–1524.

Tables

Table 1: MIC statistic for top metrics in each habitat category. Metrics selected for the QRF model are in bold.
| Category | Name | Abbreviation | MIC |
|---|---|---|---|
| ChannelUnit | Channel Unit Frequency | CU_Freq | 0.241 |
| ChannelUnit | Fast Turbulent Frequency | FstTurb_Freq | 0.230 |
| ChannelUnit | Fast NonTurbulent Frequency | FstNT_Freq | 0.209 |
| ChannelUnit | Slow Water Frequency | SlowWater_Freq | 0.208 |
| ChannelUnit | Fast Turbulent Percent | FstTurb_Pct | 0.195 |
| ChannelUnit | ChnlUnitTotal_Ct | ChnlUnitTotal_Ct | 0.189 |
| ChannelUnit | Channel Unit Count | CU_Ct | 0.189 |
| ChannelUnit | Fast Turbulent Count | FstTurb_Ct | 0.178 |
| ChannelUnit | Slow Water Percent | SlowWater_Pct | 0.177 |
| ChannelUnit | Fast NonTurbulent Percent | FstNT_Pct | 0.169 |
| Complexity | Wetted Width To Depth Ratio Avg | WetWDRat_Avg | 0.247 |
| Complexity | Bankfull Width To Depth Ratio Avg | BfWDRat_Avg | 0.245 |
| Complexity | Wetted Depth SD | DpthWet_SD | 0.232 |
| Complexity | Wetted Channel Braidedness | WetBraid | 0.212 |
| Complexity | Bankfull Channel Braidedness | BfBraid | 0.211 |
| Complexity | Wetted Channel Qualifying Island Count | Wet_QIsland_Ct | 0.209 |
| Complexity | Bankfull Width CV | BfWdth_CV | 0.209 |
| Complexity | Bankfull Width To Depth Ratio CV | BfWDRat_CV | 0.202 |
| Complexity | Detrended Elevation SD | DetrendElev_SD | 0.196 |
| Complexity | Bankfull Channel Qualifying Island Count | Bf_QIsland_Ct | 0.193 |
| Cover | Fish Cover: Total | FishCovTotal | 0.225 |
| Cover | Fish Cover: None | FishCovNone | 0.224 |
| Cover | Fish Cover: LW | FishCovLW | 0.213 |
| Cover | Fish Cover: Terrestrial Vegetation | FishCovTVeg | 0.204 |
| Cover | Percent Undercut by Length | UcutLgth_Pct | 0.185 |
| Cover | Percent Undercut by Area | UcutArea_Pct | 0.184 |
| Cover | Fish Cover: Aquatic Vegetation | FishCovAqVeg | 0.166 |
| Cover | Fish Cover: Artificial | FishCovArt | 0.136 |
| Land Classification | Natural Class PCA 2 | NatPrin2 | 0.271 |
| Land Classification | Natural Class PCA 1 | NatPrin1 | 0.258 |
| Land Classification | Disturbance Class PCA 1 | DistPrin1 | 0.242 |
| Riparian | Riparian Cover: Understory | RipCovUstory | 0.206 |
| Riparian | RipCovUstoryNone | RipCovUstoryNone | 0.206 |
| Riparian | Riparian Cover: No Canopy | RipCovCanNone | 0.194 |
| Riparian | Riparian Cover: Some Canopy | RipCovCanSome | 0.194 |
| Riparian | Riparian Cover: Big Tree | RipCovBigTree | 0.184 |
| Riparian | Riparian Cover: Ground | RipCovGrnd | 0.182 |
| Riparian | RipCovGrndNone | RipCovGrndNone | 0.170 |
| Riparian | Riparian Cover: Woody | RipCovWood | 0.168 |
| Riparian | Riparian Cover: Non-Woody | RipCovNonWood | 0.166 |
| Riparian | Riparian Cover: Coniferous | RipCovConif | 0.164 |
| SideChannel | Bankfull Side Channel Width | BfSCWdth | 0.223 |
| SideChannel | Wetted Side Channel Width | WetSCWdth | 0.213 |
| SideChannel | Wetted Side Channel Percent By Area | WetSC_Pct | 0.209 |
| SideChannel | SCSm_Freq | SCSm_Freq | 0.153 |
| SideChannel | SCSm_Ct | SCSm_Ct | 0.153 |
| SideChannel | SC_Area_Pct | SC_Area_Pct | 0.153 |
| Size | Mean Annual Flow | MeanU | 0.346 |
| Size | Wetted Width Integrated | WetWdth_Int | 0.332 |
| Size | Bankfull Width Integrated | BfWdthInt | 0.324 |
| Size | Wetted Width Avg | WetWdth_Avg | 0.324 |
| Size | Drainage Area (Flowline) | CUMDRAINAG | 0.302 |
| Size | Bankfull Width Avg | BfWdth_Avg | 0.298 |
| Size | DpthThlwg_Avg | DpthThlwg_Avg | 0.280 |
| Size | Discharge | Q | 0.259 |
| Size | Bankfull Depth Avg | DpthBf_Avg | 0.245 |
| Size | Bankfull Depth Max | DpthBf_Max | 0.240 |
| Substrate | Substrate < 6mm | SubLT6 | 0.237 |
| Substrate | Substrate < 2mm | SubLT2 | 0.227 |
| Substrate | Substrate: D16 | SubD16 | 0.219 |
| Substrate | Substrate: Embeddedness Avg | SubEmbed_Avg | 0.204 |
| Substrate | Substrate: D50 | SubD50 | 0.197 |
| Substrate | Substrate Est: Sand and Fines | SubEstSandFines | 0.190 |
| Substrate | Substrate Est: Cobbles | SubEstCbl | 0.185 |
| Substrate | Substrate: D84 | SubD84 | 0.185 |
| Substrate | Substrate Est: Boulders | SubEstBldr | 0.183 |
| Substrate | Substrate: Embeddedness SD | SubEmbed_SD | 0.181 |
| Temperature | Avg. August Temperature | avg_aug_temp | 0.272 |
| Temperature | Elev_M | Elev_M | 0.262 |
| Temperature | August Temperature | aug_temp | 0.188 |
| Temperature | Solar Access: Summer Avg | SolarSummr_Avg | 0.186 |
| WaterQuality | Conductivity | Cond | 0.254 |
| WaterQuality | Alkalinity | Alk | 0.225 |
| WaterQuality | Drift Biomass | DriftBioMass | 0.000 |
| Wood | Large Wood Volume: Bankfull Slow Water | LWVol_BfSlow | 0.213 |
| Wood | Large Wood Volume: Wetted Slow Water | LWVol_WetSlow | 0.207 |
| Wood | Large Wood Frequency: Wetted | LWFreq_Wet | 0.199 |
| Wood | Large Wood Volume: Bankfull | LWVol_Bf | 0.189 |
| Wood | Large Wood Volume: Wetted Fast Turbulent | LWVol_WetFstTurb | 0.187 |
| Wood | Large Wood Frequency: Bankfull | LWFreq_Bf | 0.178 |
| Wood | Large Wood Volume: Bankfull Fast NonTurbulent | LWVol_BfFstNT | 0.175 |
| Wood | Large Wood Volume: Wetted | LWVol_Wet | 0.166 |
| Wood | Large Wood Volume: Wetted Fast NonTurbulent | LWVol_WetFstNT | 0.159 |

Figures

Figure 1: Barplots of MIC statistics, faceted by habitat category.

Figure 2: Barplot of MIC statistics, colored by habitat category.

Figure 3: Correlation plots of metrics, faceted by habitat category.

Figure 4: Correlation plot of habitat metrics used in QRF model.

Figure 5: Pairs plot of habitat metrics used in QRF model with a correlation coefficient greater than 0.5.